Dilated Recurrent Neural Networks

Neural Information Processing Systems

Learning with recurrent neural networks (RNNs) on long sequences is a notoriously difficult task. There are three major challenges: 1) complex dependencies, 2) vanishing and exploding gradients, and 3) efficient parallelization. In this paper, we introduce a simple yet effective RNN connection structure, the DilatedRNN, which simultaneously tackles all of these challenges. The proposed architecture is characterized by multi-resolution dilated recurrent skip connections and can be combined flexibly with diverse RNN cells. Moreover, the DilatedRNN reduces the number of parameters needed and enhances training efficiency significantly, while matching state-of-the-art performance (even with standard RNN cells) in tasks involving very long-term dependencies. To provide a theory-based quantification of the architecture's advantages, we introduce a memory capacity measure, the mean recurrent length, which is more suitable for RNNs with long skip connections than existing measures. We rigorously prove the advantages of the DilatedRNN over other recurrent neural architectures.
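The key structural idea is that a dilated recurrent layer replaces the usual dependency on h(t-1) with a skip to h(t-d), and layers are stacked with increasing dilations. A minimal NumPy sketch under that reading of the abstract (the `dilated_rnn_layer` name and the plain tanh cell are illustrative, not the paper's code):

```python
import numpy as np

def dilated_rnn_layer(xs, dilation, W_x, W_h, b):
    """One layer with a dilated recurrent skip connection:
    h[t] depends on h[t - dilation] instead of h[t - 1]."""
    hidden = W_h.shape[0]
    hs = []
    for t, x in enumerate(xs):
        h_prev = hs[t - dilation] if t >= dilation else np.zeros(hidden)
        hs.append(np.tanh(W_x @ x + W_h @ h_prev + b))
    return hs

# Toy usage: stack 3 layers with exponentially increasing dilations 1, 2, 4.
rng = np.random.default_rng(0)
seq = [rng.standard_normal(8) for _ in range(16)]
for d in (1, 2, 4):
    seq = dilated_rnn_layer(seq, d,
                            rng.standard_normal((8, 8)) * 0.1,
                            rng.standard_normal((8, 8)) * 0.1,
                            np.zeros(8))
print(len(seq))  # 16
```

Because the cell at dilation d only looks back d steps along its own layer, gradients reach distant timesteps through far fewer nonlinear transformations, which is the mechanism behind the paper's mean-recurrent-length argument.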


ParaRNN: Unlocking Parallel Training of Nonlinear RNNs for Large Language Models

Danieli, Federico, Rodriguez, Pau, Sarabia, Miguel, Suau, Xavier, Zappella, Luca

arXiv.org Artificial Intelligence

Since its introduction by Vaswani et al. (2023), the Transformer architecture has quickly imposed itself as the de-facto choice for sequence modeling, surpassing previous state-of-the-art, RNN-based models such as GRU and LSTM (Cho et al., 2014; Hochreiter & Schmidhuber, 1997). One key reason behind the rapid adoption of Transformers lies in the efficiency of their application at training time: their core sequence mixer, the attention mechanism, can in fact be applied in parallel along the length of the input sequence. This effectively overcomes one main limitation of classical RNNs, whose application must be unrolled sequentially along the input sequence. In more recent times, however, interest in RNNs has been rekindled, largely due to their reduced memory footprint and improved efficiency at inference time. In particular, recent advancements in State Space Models (SSMs) such as Mamba (Gu & Dao, 2023; Dao & Gu, 2024) have started gaining popularity and are imposing themselves as a potential alternative to Transformers, at least for small-to-mid-sized models (Zuo et al., 2024). To grant parallelization during training (and hence a comparable performance to attention), SSMs simplify the recurrence relationship at their core to one that is purely linear in the hidden state. This simplification enables leveraging associativity and parallel reduction operations to quickly compute the output of the application of an SSM to a whole input sequence in parallel along its length. Despite the success of modern SSMs, the linearity constraint remains a limitation which hinders their expressive power (Merrill et al., 2025; Cirone et al., 2025), and is dictated by necessity rather than choice. With ParaRNN, we aim to overcome the constraint of linearity for SSMs, while unlocking parallelization also for nonlinear RNNs, thus enriching the space of viable options for sequence modeling.
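The parallelization trick that linear SSMs exploit, and that ParaRNN aims to generalize, can be made concrete: a linear recurrence h_t = a_t * h_(t-1) + b_t is a composition of affine maps, and affine-map composition is associative, so prefix results can be combined in any order by a parallel scan. A minimal NumPy sketch (the reduction here runs sequentially for clarity; function names are illustrative):

```python
import numpy as np

def combine(p, q):
    """Associative operator for h_t = a_t * h_{t-1} + b_t.
    Composing (a_p, b_p) then (a_q, b_q) yields (a_q*a_p, a_q*b_p + b_q)."""
    a_p, b_p = p
    a_q, b_q = q
    return (a_q * a_p, a_q * b_p + b_q)

def scan_linear_recurrence(a, b):
    """Inclusive scan over the affine maps; because `combine` is associative,
    a real implementation can evaluate this as a parallel reduction tree."""
    out = [(a[0], b[0])]
    for t in range(1, len(a)):
        out.append(combine(out[-1], (a[t], b[t])))
    return np.array([h for _, h in out])

a = np.array([0.5, 0.9, 1.1, 0.7])
b = np.array([1.0, -2.0, 0.5, 3.0])

# Sequential reference recurrence, starting from h = 0.
h, seq = 0.0, []
for at, bt in zip(a, b):
    h = at * h + bt
    seq.append(h)
assert np.allclose(scan_linear_recurrence(a, b), seq)
```

A nonlinear cell h_t = f(h_(t-1), x_t) admits no such associative decomposition, which is exactly why the linearity constraint exists and why lifting it for parallel training is nontrivial.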


Lattice Protein Folding with Variational Annealing

Khandoker, Shoummo Ahsan, Inack, Estelle M., Hibat-Allah, Mohamed

arXiv.org Artificial Intelligence

Understanding the principles of protein folding is a cornerstone of computational biology, with implications for drug design, bioengineering, and the understanding of fundamental biological processes. Lattice protein folding models offer a simplified yet powerful framework for studying the complexities of protein folding, enabling the exploration of energetically optimal folds under constrained conditions. However, finding these optimal folds is a computationally challenging combinatorial optimization problem. In this work, we introduce a novel upper-bound training scheme that employs masking to identify the lowest-energy folds in two-dimensional Hydrophobic-Polar (HP) lattice protein folding. By leveraging Dilated Recurrent Neural Networks (RNNs) integrated with an annealing process driven by temperature-like fluctuations, our method accurately predicts optimal folds for benchmark systems of up to 60 beads. Our approach also effectively masks invalid folds from being sampled without compromising the autoregressive sampling properties of RNNs. This scheme is generalizable to three spatial dimensions and can be extended to lattice protein models with larger alphabets. Our findings emphasize the potential of advanced machine learning techniques in tackling complex protein folding problems and a broader class of constrained combinatorial optimization challenges.


Fast-Slow Recurrent Neural Networks

Asier Mujika, Florian Meier, Angelika Steger

Neural Information Processing Systems

Processing sequential data of variable length is a major challenge in a wide range of applications, such as speech recognition, language modeling, generative image modeling and machine translation. Here, we address this challenge by proposing a novel recurrent neural network (RNN) architecture, the Fast-Slow RNN (FS-RNN). The FS-RNN incorporates the strengths of both multiscale RNNs and deep transition RNNs, as it processes sequential data on different timescales and learns complex transition functions from one time step to the next. We evaluate the FS-RNN on two character-level language modeling datasets, Penn Treebank and Hutter Prize Wikipedia, where we improve state-of-the-art results to 1.19 and 1.25 bits-per-character (BPC), respectively. In addition, an ensemble of two FS-RNNs achieves 1.20 BPC on Hutter Prize Wikipedia, outperforming the best known compression algorithm with respect to the BPC measure. We also present an empirical investigation of the learning and network dynamics of the FS-RNN, which explains the improved performance compared to other RNN architectures. Our approach is general, as any kind of RNN cell is a possible building block for the FS-RNN architecture, and thus it can be flexibly applied to different tasks.
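The fast-slow structure can be sketched as several fast cells applied sequentially within each timestep, with a slow cell updating once in between. An illustrative NumPy toy under that reading (plain tanh cells with matched dimensions for simplicity; the paper itself uses LSTM cells, and `fs_rnn_step` is a hypothetical name):

```python
import numpy as np

def rnn_cell(x, h, W_x, W_h):
    """A plain tanh cell standing in for the paper's LSTM cells."""
    return np.tanh(W_x @ x + W_h @ h)

def fs_rnn_step(x, h_fast, h_slow, fast_params, slow_params, k=2):
    """One FS-RNN timestep: the first fast cell consumes the input, the slow
    cell updates once per timestep, then the remaining fast cells consume the
    slow state, so information is mixed across timescales within one step."""
    Wxf, Whf = fast_params
    Wxs, Whs = slow_params
    h_fast = rnn_cell(x, h_fast, Wxf, Whf)        # fast cell 1: sees the input
    h_slow = rnn_cell(h_fast, h_slow, Wxs, Whs)   # slow cell: once per step
    for _ in range(k - 1):                        # fast cells 2..k
        h_fast = rnn_cell(h_slow, h_fast, Wxf, Whf)
    return h_fast, h_slow

# Toy usage over a random sequence with hidden size 4.
rng = np.random.default_rng(0)
n = 4
fast = (rng.standard_normal((n, n)) * 0.1, rng.standard_normal((n, n)) * 0.1)
slow = (rng.standard_normal((n, n)) * 0.1, rng.standard_normal((n, n)) * 0.1)
h_f, h_s = np.zeros(n), np.zeros(n)
for x in rng.standard_normal((10, n)):
    h_f, h_s = fs_rnn_step(x, h_f, h_s, fast, slow)
```

Running the fast chain k times per input step is what gives the "deep transition" behavior, while the once-per-step slow cell provides the coarse timescale.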


Supplementing Recurrent Neural Networks with Annealing to Solve Combinatorial Optimization Problems

Khandoker, Shoummo Ahsan, Abedin, Jawaril Munshad, Hibat-Allah, Mohamed

arXiv.org Artificial Intelligence

Combinatorial optimization problems can be solved by heuristic algorithms such as simulated annealing (SA) which aims to find the optimal solution within a large search space through thermal fluctuations. The algorithm generates new solutions through Markov-chain Monte Carlo techniques. This sampling scheme can result in severe limitations, such as slow convergence and a tendency to stay within the same local search space at small temperatures. To overcome these shortcomings, we use the variational classical annealing (VCA) framework [1] that combines autoregressive recurrent neural networks (RNNs) with traditional annealing to sample solutions that are uncorrelated. In this paper, we demonstrate the potential of using VCA as an approach to solving real-world optimization problems. We explore VCA's performance in comparison with SA at solving three popular optimization problems: the maximum cut problem (Max-Cut), the nurse scheduling problem (NSP), and the traveling salesman problem (TSP). For all three problems, we find that VCA outperforms SA on average in the asymptotic limit by one or more orders of magnitude in terms of relative error. Interestingly, we reach large system sizes of up to 256 cities for the TSP. We also conclude that in the best case scenario, VCA can serve as a great alternative when SA fails to find the optimal solution.


DartsReNet: Exploring new RNN cells in ReNet architectures

Moser, Brian, Raue, Federico, Hees, Jörn, Dengel, Andreas

arXiv.org Artificial Intelligence

We present new Recurrent Neural Network (RNN) cells for image classification using a Neural Architecture Search (NAS) approach called DARTS. We are interested in the ReNet architecture, an RNN-based approach presented as an alternative to convolutional and pooling layers. ReNet can be defined using any standard RNN cell, such as LSTM or GRU. One limitation is that standard RNN cells were designed for one-dimensional sequential data rather than the two dimensions of image classification. We overcome this limitation by using DARTS to find new cell designs. We compare our results with ReNet using GRU and LSTM cells. The cells we found outperform the standard RNN cells on CIFAR-10 and SVHN. The improvements on SVHN indicate generalizability, as we derived the RNN cell designs from CIFAR-10 without performing a new cell search for SVHN.


Minimal Width for Universal Property of Deep RNN

Song, Chang hoon, Hwang, Geonho, Lee, Jun ho, Kang, Myungjoo

arXiv.org Artificial Intelligence

A recurrent neural network (RNN) is a widely used deep-learning network for dealing with sequential data. Imitating a dynamical system, an infinite-width RNN can approximate any open dynamical system in a compact domain. In general, deep networks with bounded widths are more effective than wide networks in practice; however, the universal approximation theorem for deep narrow structures has yet to be extensively studied. In this study, we prove the universality of deep narrow RNNs and show that the upper bound of the minimum width for universality can be independent of the length of the data. Specifically, we show that a deep RNN with ReLU activation can approximate any continuous function or $L^p$ function with the widths $d_x+d_y+2$ and $\max\{d_x+1,d_y\}$, respectively, where the target function maps a finite sequence of vectors in $\mathbb{R}^{d_x}$ to a finite sequence of vectors in $\mathbb{R}^{d_y}$. We also compute the additional width required if the activation function is $\tanh$ or more. In addition, we prove the universality of other recurrent networks, such as bidirectional RNNs. Bridging a multi-layer perceptron and an RNN, our theory and proof technique can be an initial step toward further research on deep RNNs.


Building A Recurrent Neural Network From Scratch In Python

#artificialintelligence

We can think of the RNN model shown in figure 1 as repeated use of the single cell shown in figure 2. First, we will implement a single cell, and then we can loop through it, stacking these single cells over each other to create the forward pass of the RNN model. The basic RNN cell takes as input the current data point and the hidden state from the previous time step, and produces the next hidden state along with an output prediction. Let's implement the RNN cell shown in figure 2 through four main steps, starting with the softmax activation function:
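A sketch of those steps (variable names follow the common Waa/Wax/Wya convention for a basic RNN cell; this is not necessarily the post's exact code):

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating."""
    e = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def rnn_cell_forward(xt, a_prev, params):
    """One forward step of a basic RNN cell:
    a_next = tanh(Waa @ a_prev + Wax @ xt + ba),  yt = softmax(Wya @ a_next + by)."""
    Waa, Wax, Wya, ba, by = (params[k] for k in ("Waa", "Wax", "Wya", "ba", "by"))
    a_next = np.tanh(Waa @ a_prev + Wax @ xt + ba)
    yt = softmax(Wya @ a_next + by)
    return a_next, yt

# Looping the cell over time steps gives the full forward pass.
rng = np.random.default_rng(1)
n_x, n_a, n_y, T = 3, 5, 2, 4
params = {"Waa": rng.standard_normal((n_a, n_a)) * 0.1,
          "Wax": rng.standard_normal((n_a, n_x)) * 0.1,
          "Wya": rng.standard_normal((n_y, n_a)) * 0.1,
          "ba": np.zeros((n_a, 1)), "by": np.zeros((n_y, 1))}
a = np.zeros((n_a, 1))
for t in range(T):
    a, y = rnn_cell_forward(rng.standard_normal((n_x, 1)), a, params)
print(np.isclose(y.sum(), 1.0))  # softmax outputs sum to 1
```

The same `a` returned at each step is fed back in as `a_prev` at the next step, which is the "loop through it" part of the forward pass.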


An Improved Time Feedforward Connections Recurrent Neural Networks

Wang, Jin, Zou, Yongsong, Lim, Se-Jung

arXiv.org Artificial Intelligence

Recurrent Neural Networks (RNNs) have been widely applied to temporal problems such as flood forecasting and financial data processing. On the one hand, traditional RNN models amplify the gradient issue due to their strict serial time dependency, making it difficult to realize a long-term memory function. On the other hand, RNN cells are highly complex, which significantly increases computational complexity and wastes computational resources during model training. In this paper, an improved Time Feedforward Connections Recurrent Neural Networks (TFC-RNNs) model is first proposed to address the gradient issue. A parallel branch is introduced so that the hidden state at time t-2 is transferred directly to time t without the nonlinear transformation at time t-1. This is effective in improving the long-term dependence of RNNs. Then, a novel cell structure named the Single Gate Recurrent Unit (SGRU) is presented. This cell structure reduces the number of parameters in the RNN cell, consequently reducing computational complexity. Next, applying SGRU to TFC-RNNs as a new TFC-SGRU model addresses both difficulties. Finally, the performance of the proposed TFC-SGRU is verified through several experiments in terms of long-term memory and anti-interference capabilities. Experimental results demonstrate that the TFC-SGRU model can capture helpful information over a span of 1500 time steps and effectively filter out noise. The TFC-SGRU model achieves better accuracy than the LSTM and GRU models on language-processing tasks.
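The time-feedforward connection described above can be sketched directly: the state from t-2 feeds into the update at t through its own weight matrix, bypassing the nonlinearity applied at t-1. A minimal NumPy illustration under that reading (the `tfc_rnn_forward` name and the plain tanh cell are illustrative, not the paper's SGRU):

```python
import numpy as np

def tfc_rnn_forward(xs, W_x, W_h1, W_h2, b):
    """Forward pass with a time-feedforward connection: h[t] receives h[t-2]
    directly (via W_h2), in addition to the usual h[t-1] path (via W_h1)."""
    n = W_h1.shape[0]
    h_prev1, h_prev2 = np.zeros(n), np.zeros(n)
    hs = []
    for x in xs:
        h = np.tanh(W_x @ x + W_h1 @ h_prev1 + W_h2 @ h_prev2 + b)
        h_prev2, h_prev1 = h_prev1, h  # shift the two-step state window
        hs.append(h)
    return hs

# Toy usage: a 6-step sequence of 3-dimensional inputs, hidden size 4.
rng = np.random.default_rng(0)
hs = tfc_rnn_forward(rng.standard_normal((6, 3)),
                     rng.standard_normal((4, 3)) * 0.1,
                     rng.standard_normal((4, 4)) * 0.1,
                     rng.standard_normal((4, 4)) * 0.1,
                     np.zeros(4))
```

Because the h[t-2] term enters the update without passing through the tanh at t-1, gradients flowing along that branch skip one nonlinearity per two steps, which is the mechanism behind the claimed improvement in long-term dependence.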